You are viewing the RapidMiner Studio documentation for version 10.2.

Select by Weights (Multi) (Operator Toolbox)

Synopsis

This operator selects columns by a given criterion, such as correlation.

Description

This operator filters a given data set for its most dependent attributes. The dependency measure is chosen with the parameter method.

It combines the functionality of a Weight by operator, such as Weight by Correlation, with that of Select by Weights.

Unlike most other Weight by operators, this operator also allows weights. To use weights, set the role of the column containing them to 'weight'.
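The source does not state how the weight column enters the calculation. As a rough illustration only, the following Python sketch shows the standard weighted Pearson correlation, which is one common way to account for per-row weights. The function name weighted_pearson and the toy values are hypothetical and are not part of the operator.

```python
import math

def weighted_pearson(x, y, w):
    """Standard weighted Pearson correlation of x and y with row weights w.

    Assumption: this is a textbook formula, not necessarily what the
    operator computes internally.
    """
    total = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    var_x = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    var_y = sum(wi * (yi - mean_y) ** 2 for wi, yi in zip(w, y))
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined for a constant column
    return cov / math.sqrt(var_x * var_y)

# Made-up toy columns; the third row counts twice as much as the others.
feature = [1.0, 2.0, 3.0, 4.0]
label = [2.0, 4.0, 6.0, 8.0]
weights = [1.0, 1.0, 2.0, 1.0]
r = weighted_pearson(feature, label, weights)
print(round(r, 6))
```

With all weights equal, the formula reduces to the ordinary Pearson correlation.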

Input

  • exa (Data Table)

    The table you want to filter.

Output

  • exa (Data Table)

    The filtered table.

  • ori (Data Table)

    The original table.

  • wei (Data Table)

    A table with the attribute names and their respective weights. Incompatible attributes contain a missing value for their weight.

Parameters

  • method

    The method used to calculate the weights.

  • filter_method

    Whether to select the top k attributes or all attributes whose weight is greater than or equal to a given threshold.

  • k

    The number of attributes to select. Only available if filter_method is set to top k.

  • min_value

    The threshold a weight must reach for the attribute to be selected. Only available if filter_method is set to greater equals.

  • use_absolutes

    If set to true, the operator uses the absolute values of the chosen measure.

  • keep_incompatible

    If set to true, incompatible attributes remain in the result table. For example, nominal attributes are incompatible with Pearson correlation, which is not defined for nominal attributes or labels.
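How the parameters above interact can be sketched in Python. This is a minimal illustration under stated assumptions, not the operator's implementation: the function name select_attributes, the default values, and the use of None for a missing weight are all hypothetical. The filter method names "top k" and "greater equals" are taken from the parameter descriptions.

```python
import math

def select_attributes(weights, filter_method="top k", k=2, min_value=0.5,
                      use_absolutes=True, keep_incompatible=False):
    """Return the names of the attributes to keep.

    weights maps attribute name -> weight; None (or NaN) marks an
    incompatible attribute for which no weight could be calculated.
    """
    scored = {}
    incompatible = []
    for name, weight in weights.items():
        if weight is None or math.isnan(weight):
            incompatible.append(name)
        else:
            scored[name] = abs(weight) if use_absolutes else weight

    if filter_method == "top k":
        # Rank by score, highest first, and keep the first k names.
        selected = sorted(scored, key=scored.get, reverse=True)[:k]
    elif filter_method == "greater equals":
        selected = [name for name, score in scored.items() if score >= min_value]
    else:
        raise ValueError(f"unknown filter_method: {filter_method!r}")

    if keep_incompatible:
        selected += incompatible
    return selected

# Made-up weights for illustration; "d" is incompatible.
example_weights = {"a": 0.18, "b": -0.24, "c": -0.15, "d": None}
print(select_attributes(example_weights, "top k", k=2, keep_incompatible=True))
# prints ['b', 'a', 'd']
```

Note that with use_absolutes disabled, a strongly negative weight such as -0.24 ranks below every positive weight and fails any positive threshold.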

Tutorial Processes

Predicting the Age of Titanic Passengers

In this example we predict the age of a passenger on the Titanic. We first use Select by Weights (Multi) with Pearson correlation to reduce the numerical columns to two. The operator is set to keep incompatible attributes, so the nominal attributes are kept.
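The selection step of this tutorial can be approximated outside RapidMiner with a short Python sketch. The rows below are made up and are not the real Titanic data, the column names are hypothetical, and the sketch assumes that absolute correlation values are ranked and that the label column stays in the result.

```python
import math

def pearson(x, y):
    """Plain Pearson correlation of two equally long numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Made-up rows for illustration only.
table = {
    "age": [20.0, 30.0, 40.0, 50.0],
    "fare": [10.0, 20.0, 30.0, 45.0],
    "class": [3.0, 3.0, 2.0, 1.0],
    "siblings": [1.0, 0.0, 1.0, 0.0],
    "sex": ["male", "female", "female", "male"],
}
label = "age"

# Numeric columns get a weight; nominal columns are incompatible with Pearson.
numeric = [c for c in table
           if c != label and all(isinstance(v, (int, float)) for v in table[c])]
nominal = [c for c in table if c != label and c not in numeric]

weights = {c: abs(pearson(table[c], table[label])) for c in numeric}
top_two = sorted(weights, key=weights.get, reverse=True)[:2]

# Keep the label, the two best numeric columns, and the nominal columns.
keep = [label] + top_two + nominal
filtered = {c: table[c] for c in table if c in keep}
print(list(filtered))
```

In this toy table the siblings column has the weakest correlation with age and is dropped, while sex is retained because incompatible attributes are kept.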